Alicia Miles
This document provides a clear and simple guide to network detection and visualization in R. Below, I leverage the igraph, visNetwork, and DiagrammeR packages to effectively detect and visualize transaction networks.
The transaction data used in this example was generated using the randomNames package. The code below can be used to analyze almost any financial institution transaction dataset, as the only variables used are originator, beneficiary, and transaction amount. More generally, the code is useful for analyzing almost any dataset in which there is a sender and a recipient. Other applications include email communication and social media activity.
Begin by loading the relevant packages and the transaction dataset.
library(dplyr)
library(igraph)
library(magrittr)
library(visNetwork)
library(DiagrammeR)
library(data.table)
mydata <- fread("~/Desktop/R/Data/Network Analysis Transaction Data.csv", header = T, stringsAsFactors=FALSE)
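For readers who do not have the CSV file above, a small stand-in dataset can be built by hand. This is only a sketch: the names and amounts are invented for illustration, and the column names originator, beneficiary, and transaction_amount are assumed to match the original file (originator and transaction_amount are the names the later code refers to; beneficiary is a guess).

```r
# Hypothetical stand-in for the transaction file; all values are invented.
toy_data <- data.frame(
  originator         = c("Ana", "Ana", "Ben", "Cal", "Dee"),
  beneficiary        = c("Ben", "Ben", "Ana", "Cal", "Eve"),
  transaction_amount = c(100, 50, 25, 10, 70),
  stringsAsFactors   = FALSE
)

# One row per transaction: sender, recipient, and amount.
str(toy_data)
```

Any data frame with this shape (sender in the first column, recipient in the second) can be passed to the graph-building step that follows.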
Next, create an igraph graph object.
graph <- graph.data.frame(mydata, directed=F)
Then simplify the graph to remove loops and multiple edges. During simplification, I aggregate the attributes of the combined edges: summing the edge weights gives each counterparty pair’s total transaction volume (the number of transactions), and summing transaction_amount gives each pair’s total principal.
E(graph)$weight <- 1
graph <- simplify(graph, edge.attr.comb=list(weight = "sum", transaction_amount = "sum", function(x)length(x)))
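To make the effect of the simplification concrete, here is a minimal sketch on invented data. It uses the current igraph function names (graph_from_data_frame, as_data_frame) rather than the older aliases used in this document, and it passes "ignore" as the default for any other edge attributes, which is my choice for the example and not the setting used above.

```r
library(igraph)

# Invented transactions: Ana and Ben trade three times, Cal pays himself once.
tx <- data.frame(
  originator         = c("Ana", "Ana", "Ben", "Cal", "Dee"),
  beneficiary        = c("Ben", "Ben", "Ana", "Cal", "Eve"),
  transaction_amount = c(100, 50, 25, 10, 70)
)

g <- graph_from_data_frame(tx, directed = FALSE)
E(g)$weight <- 1  # each raw edge counts as one transaction

# Merge parallel edges and drop the Cal-Cal loop, summing both attributes.
g_simple <- simplify(
  g,
  edge.attr.comb = list(weight = "sum", transaction_amount = "sum", "ignore")
)

# One row per counterparty pair: transaction count (weight) and total amount.
pair_totals <- igraph::as_data_frame(g_simple, what = "edges")
print(pair_totals)
```

The three Ana-Ben transactions collapse into a single edge, so the edge table doubles as a per-pair summary.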
Next, use the clusters function to find the connected components of the graph and assign each vertex’s component (cluster) id as a vertex attribute.
networks <- clusters(as.undirected(graph))
V(graph)$network <- networks$membership
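As an illustration of what the component calculation returns, the sketch below runs it on a tiny invented graph. It uses components(), the current igraph name for the same calculation; the vertex names are hypothetical.

```r
library(igraph)

# Invented edges forming two separate groups: A-B-C and D-E.
edges <- data.frame(from = c("A", "B", "D"), to = c("B", "C", "E"))
g <- graph_from_data_frame(edges, directed = FALSE)

comp <- components(g)
comp$no     # number of connected components
comp$csize  # number of vertices in each component

# Store the component id on each vertex, as in the step above.
V(g)$network <- comp$membership
data.frame(name = V(g)$name, network = V(g)$network)
```

Vertices that can reach each other through any chain of transactions share the same network id.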
Then convert the igraph graph into a data frame containing the vertex and cluster id and merge the data frame with the original data.
nodes <- get.data.frame(graph, what="vertices")
dt <- data.table(merge(mydata, nodes, by.x=c("originator"), by.y=c("name")))
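The merge can be checked on a small invented example. The sketch below uses base R merge and aggregate in place of data.table, purely to keep the example dependency-free; the column names are the assumed ones from the transaction file.

```r
library(igraph)

# Invented transactions in two separate networks.
tx <- data.frame(
  originator         = c("A", "B", "D"),
  beneficiary        = c("B", "C", "E"),
  transaction_amount = c(10, 20, 70)
)

g <- graph_from_data_frame(tx, directed = FALSE)
V(g)$network <- components(g)$membership

# Vertex table with columns `name` and `network`.
vertex_df <- igraph::as_data_frame(g, what = "vertices")

# Attach each originator's network id to its transactions.
tx_labeled <- merge(tx, vertex_df, by.x = "originator", by.y = "name")

# Example use: total principal per network.
network_totals <- aggregate(transaction_amount ~ network, data = tx_labeled, FUN = sum)
print(network_totals)
```

Because every originator is also a vertex, the merge keeps all transaction rows.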
The above steps enable me to easily and efficiently obtain node and edge attribute information when creating the visNetwork and DiagrammeR graph objects below.
Now for the graph.
While network detection is quite simple, effective network visualization can be quite challenging when working with large transaction datasets in which there can be many individuals in a network and frequent, overlapping network connections. In addition, most analyses on the internet that explain how to visualize and analyze networks discuss more advanced network analysis techniques than one really requires when visualizing and analyzing a transaction dataset.
In addition, certain packages make it difficult to identify the individual nodes and connections. I have found that visNetwork and DiagrammeR are two of the most effective packages for visualizing and analyzing transaction networks. Ultimately, the most useful type of network visual may depend on the size and structure of the network, as well as one’s goals.
The visNetwork package is truly a revelation. The package enables the user to easily and efficiently visualize networks as well as select nodes in order to highlight clients and their counterparties in the larger network graph.
Below, I create a visNetwork network graph.
nodes <- data.frame(id = nodes$name, title = nodes$name, group = nodes$network)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]
visNetwork(nodes, edges) %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)%>%
visGroups(groupname = "1", color = "maroon")
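The key requirement in the code above is the shape of the two input tables: the node table needs an id column, and the edge table needs from and to columns whose values match those ids. A minimal sketch on invented data follows; the widget is only built if visNetwork is installed, and the variable names are hypothetical.

```r
library(igraph)

# Invented transactions in two separate networks.
tx <- data.frame(
  originator         = c("A", "B", "D"),
  beneficiary        = c("B", "C", "E"),
  transaction_amount = c(10, 20, 70)
)
g <- graph_from_data_frame(tx, directed = FALSE)
V(g)$network <- components(g)$membership

# Node table: `id` is required; `title` is the hover text; `group` drives coloring.
vis_nodes <- data.frame(id = V(g)$name, title = V(g)$name, group = V(g)$network)

# Edge table: `from` and `to` must refer to values in vis_nodes$id.
vis_edges <- igraph::as_data_frame(g, what = "edges")[, c("from", "to")]

# Build the widget only when visNetwork is available.
if (requireNamespace("visNetwork", quietly = TRUE)) {
  widget <- visNetwork::visNetwork(vis_nodes, vis_edges)
  widget <- visNetwork::visOptions(widget, highlightNearest = TRUE, nodesIdSelection = TRUE)
  # Typing `widget` at an interactive console displays the graph.
}
```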
Not only can the user select the relevant node using visNetwork’s nodesIdSelection option, she can also zoom in and out to explore different parts of the graph. Moreover, the layout of the visNetwork network graph is clean and clear compared to other interactive network graphs.
As noted above, the most useful type of network visualization and analysis may depend on the size and structure of the network, as well as one’s goals. While it is hard to beat visNetwork’s network visualization capabilities, DiagrammeR does enable the user to add originator-beneficiary pair aggregate transaction totals and other information to the network graph. In addition, DiagrammeR has additional features that enable one to isolate high-risk clients, perform network analysis, and work with network graphs. Moreover, DiagrammeR provides the user with an extraordinary amount of control over node and edge attributes. As is true with visNetwork, DiagrammeR makes it possible to visualize a ‘clean’ network and truly see the individual nodes and connections. See http://rich-iannone.github.io/DiagrammeR/graphs.html for additional details.
When using DiagrammeR, I personally prefer the circo and visNetwork layouts.
Below I use DiagrammeR functions, strung together with the magrittr %>% pipe, to create and render network graphs that use the circo and visNetwork layouts. The create_graph function creates a dgr_graph object. The render_graph function, which enables the user to both visualize the networks and create output files, requires a dgr_graph object created using the create_graph function.
I prefer using DiagrammeR’s circo layout for smaller network graphs with overlapping edges and complexity. To illustrate the circo layout, I will subset the smallest network in the transaction dataset.
dg <- decompose.graph(graph)
net_1 <- dg[which(networks$csize == min(networks$csize))][[1]]
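The selection above relies on the decomposed list being in the same order as the component sizes. An alternative sketch, shown here on invented data, measures the size of each decomposed piece directly; which.min and which.max return the first match when several components tie. The names part_sizes, smallest, and largest are hypothetical.

```r
library(igraph)

# Invented edges forming two components: A-B-C (3 nodes) and D-E (2 nodes).
edges <- data.frame(from = c("A", "B", "D"), to = c("B", "C", "E"))
g <- graph_from_data_frame(edges, directed = FALSE)

# Split the graph into one igraph object per connected component.
parts <- decompose(g)

# Count the vertices in each piece, then pick the extremes.
part_sizes <- vapply(parts, vcount, numeric(1))
smallest <- parts[[which.min(part_sizes)]]
largest  <- parts[[which.max(part_sizes)]]

V(smallest)$name
V(largest)$name
```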
Then, I create the nodes and edges. The type and rel attributes for nodes and edges, respectively, are optional but important for any data modelling work.
nodes_df1 <- create_nodes(nodes = unique(V(net_1)$name), type = "person", color = "gray")
edges_df1 <- create_edges(from = get.edgelist(net_1)[,1],
to = get.edgelist(net_1)[,2],
rel = E(net_1)$transaction_amount,
color = "maroon")
edges_df1$rel = as.numeric(edges_df1$rel)
Next, I create a network graph using the smallest network and the circo layout.
graph_attrs <- c("layout = circo")
graph1 <- create_graph(nodes_df = nodes_df1, edges_df = edges_df1, graph_attrs = graph_attrs) %>%
set_graph_name("network1") %>%
set_global_graph_attr("graph","output","circo")%>%
set_global_graph_attr("graph", "principal", sum(edges_df1$rel))
graph1 %>% render_graph(width = 1000, height = 500)
Below I create a DiagrammeR visnetwork network graph using the largest network in the transaction dataset.
net_2 <- dg[which(networks$csize == max(networks$csize))][[1]]
nodes_df2 <- create_nodes(nodes = unique(V(net_2)$name), type = "person", color = "gray")
edges_df2 <- create_edges(from = get.edgelist(net_2)[,1],
to = get.edgelist(net_2)[,2],
rel = E(net_2)$transaction_amount,
color = "maroon")
edges_df2$rel = as.numeric(edges_df2$rel)
graph2 <- create_graph(nodes_df = nodes_df2, edges_df = edges_df2) %>%
set_graph_name("network2") %>%
set_global_graph_attr("graph","output","visNetwork")%>%
set_global_graph_attr("graph", "principal", sum(edges_df2$rel))
graph2 %>% render_graph(width = 1000, height = 500)
DiagrammeR provides a convenient means to work with multiple graphs using a graph series object. The time and sequence properties of the series graphs can be used for subsetting. I will create an empty graph series object and then add each network graph to the series.
series <- create_series(series_type = "sequential", series_name = "series")
series <- graph1 %>% add_to_series(series)
series <- graph2 %>% add_to_series(series)
If the network graph is extremely large, then it may be necessary to explore other options, such as graphing and analyzing communities (subnetworks) within the network rather than the entire network. Below I detect the communities within the larger network graph, create a graph series of the communities or subnetworks in the transaction dataset, then use DiagrammeR’s render_graph_from_series function to render the fourth community in the graph series.
First, I detect the communities using the fast and greedy community detection algorithm.
fc <- fastgreedy.community(graph)
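To show what the community detection step produces, here is a small sketch on an invented graph: two triangles joined by a single bridge edge. It uses cluster_fast_greedy, the current igraph name for the same algorithm. Note that the algorithm expects a graph without multiple edges, which is one reason the simplification step earlier matters.

```r
library(igraph)

# Two invented triangles (A-B-C and D-E-F) connected by one bridge edge C-D.
edges <- data.frame(
  from = c("A", "A", "B", "C", "D", "D", "E"),
  to   = c("B", "C", "C", "D", "E", "F", "F")
)
g <- graph_from_data_frame(edges, directed = FALSE)

fc_toy <- cluster_fast_greedy(g)

# Community id per vertex, and the number of vertices in each community.
community_id <- as.integer(membership(fc_toy))
data.frame(name = V(g)$name, community = community_id)
sizes(fc_toy)
```

Unlike connected components, communities can split a single connected network into densely linked subgroups.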
Next, I create a network graph for each community and add each graph to a graph series object so that the graph series object will contain all community graphs.
community_series <- create_series(series_type = "sequential", series_name = "community_series")
for (g in unique(membership(fc))) {
  subg <- induced.subgraph(graph, which((membership(fc) == g) & (sizes(fc)[[g]] != 1)))
  nodes_sub <- create_nodes(nodes = unique(V(subg)$name), type = "person", color = "gray")
  edges_sub <- create_edges(from = get.edgelist(subg)[,1],
                            to = get.edgelist(subg)[,2],
                            rel = E(subg)$transaction_amount,
                            color = "maroon")
  graph_sub <- create_graph(nodes_df = nodes_sub, edges_df = edges_sub) %>%
    set_graph_name(paste0("community_", toString(g))) %>%
    set_global_graph_attr("graph", "output", "visNetwork")
  community_series <- graph_sub %>% add_to_series(community_series)
}
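The subgraph extraction at the heart of this loop can be tried with igraph alone. The sketch below, on invented data, collects one subgraph per community in a plain named list and skips single-node communities outright. Iterating over sorted community ids is my own choice here, so that the position in the list lines up with the id in the name; the list name community_graphs is hypothetical.

```r
library(igraph)

# Two invented triangles joined by a bridge, plus one isolated vertex G.
edges <- data.frame(
  from = c("A", "A", "B", "C", "D", "D", "E"),
  to   = c("B", "C", "C", "D", "E", "F", "F")
)
g <- graph_from_data_frame(
  edges,
  directed = FALSE,
  vertices = data.frame(name = LETTERS[1:7])
)

fc_toy <- cluster_fast_greedy(g)
ids <- as.integer(membership(fc_toy))

community_graphs <- list()
for (id in sort(unique(ids))) {
  members <- which(ids == id)
  if (length(members) < 2) next  # skip single-node communities
  community_graphs[[paste0("community_", id)]] <- induced_subgraph(g, members)
}

# Vertex names in each retained community.
lapply(community_graphs, function(x) V(x)$name)
```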
To render the fourth community in the graph series, I use DiagrammeR’s render_graph_from_series function and specify the fourth sequential graph in the series (the loop above names each graph ‘community_’ followed by its community id).
render_graph_from_series(graph_series = community_series, graph_no = 4)
While network visualization is extremely useful when analyzing transaction data, visuals are really just the tip of the iceberg. Several functions in the DiagrammeR and igraph packages allow the user to easily obtain general network graph information, as well as specific node and edge information about the current state of the graph object.
Upon identifying a large network with suspicious network structures in a transaction dataset network graph, it is necessary to obtain basic network statistics, including node count and edge count. Below, I will examine network2 in the graph series object, which corresponds to the second network in the igraph network graph as I have not reordered the networks during the above network visualization.
I begin analyzing the nodes by combining graph1 and graph2 to create a single graph object and analyze all nodes at once.
all_graphs <- combine_graphs(graph1, graph2)
DiagrammeR’s node_info function enables the user to obtain information about each node, including label, type, degree, indegree, outdegree, and loops.
node_info <- node_info(all_graphs)
Using node_info and dplyr::filter is an efficient way to determine which nodes are highly connected in this graph. In the context of transaction analysis, this enables the analyst to determine which clients have a large number of counterparties and isolate those clients who send transactions to or receive transactions from a large number of counterparties. The actual number of counterparties that could indicate suspicious activity or a need for further investigation would depend on factors such as the transaction dataset start and end date.
First, I will identify highly connected nodes, or individuals who have over ten counterparties.
highly_connect <- filter(node_info(all_graphs), degree > 10)$node
Next, I will isolate those individuals in the transaction dataset who either receive transactions from or send transactions to more than five counterparties.
high_indegree <- filter(node_info(all_graphs), indegree > 5)$node
high_outdegree <- filter(node_info(all_graphs), outdegree > 5)$node
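A similar count can be sketched with igraph alone. The igraph object in this document was built as an undirected graph, so it cannot tell senders from recipients; the example below therefore builds a directed graph from invented transactions, which is an assumption made for illustration. The threshold of two counterparties is also illustrative.

```r
library(igraph)

# Invented transactions; A sends to B twice, and also to C and D.
tx <- data.frame(
  originator  = c("A", "A", "A", "A", "B", "C"),
  beneficiary = c("B", "B", "C", "D", "A", "A")
)

# Directed graph; simplify() merges the repeated A -> B edge so that
# degree counts distinct counterparties, not individual transactions.
g_dir <- simplify(graph_from_data_frame(tx, directed = TRUE))

out_counterparties <- degree(g_dir, mode = "out")  # distinct recipients per sender
in_counterparties  <- degree(g_dir, mode = "in")   # distinct senders per recipient

# Clients who send to more than two distinct counterparties.
high_out <- names(out_counterparties[out_counterparties > 2])
high_out
```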
To obtain all transaction counterparties of a given node in an igraph object, simply use the neighbors function. Here the node is specified by its vertex index (4).
counterparties <- V(graph)$name[neighbors(graph, 4)]
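Since a vertex index depends on the order in which the data was loaded, it can be more convenient to look a client up by name. A minimal sketch on invented data follows; the client name "A" is hypothetical.

```r
library(igraph)

# Invented transactions among five clients.
tx <- data.frame(
  originator  = c("A", "A", "B", "D"),
  beneficiary = c("B", "C", "C", "E")
)
g <- graph_from_data_frame(tx, directed = FALSE)

# Look up a client's counterparties by vertex name instead of index.
counterparties_of_a <- neighbors(g, "A")$name
counterparties_of_a
```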